Web Robot Detection in Academic Publishing
Authors
Abstract
Recent industry reports confirm the rise of web robots, which now account for more than half of total web traffic. They not only threaten the security, privacy, and efficiency of the web but also distort analytics and metrics, casting doubt on the veracity of the information being promoted. In the academic publishing domain, this can cause articles to be falsely presented as prominent and influential. In this paper, we present our approach to detecting web robots on academic publishing websites. We use different supervised learning algorithms with a variety of features derived from both the server's log files and the content served by the website. Our approach relies on the assumption that human users will be interested in specific domains or articles, while web robots crawl a web library incoherently. We experiment with features adopted in previous studies, adding novel semantic features derived from a semantic analysis using the Latent Dirichlet Allocation (LDA) algorithm. Our real-world case study shows promising results, highlighting the significance of semantic features in the web robot detection problem.
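The assumption that humans browse coherently while robots do not can be illustrated with a small sketch. It is not the authors' feature set, which the abstract does not spell out. It assumes each requested article already has a topic distribution (for example, from an LDA model fitted elsewhere) and scores a session by the mean pairwise cosine similarity of those distributions. The function names `cosine` and `session_topic_coherence` and the sample vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def session_topic_coherence(topic_vectors):
    """Mean pairwise cosine similarity of the topic vectors in one session.

    Hypothetical feature: higher values mean the requested articles share
    topics. Sessions with fewer than two articles return 1.0, which is an
    arbitrary convention chosen for this sketch.
    """
    pairs = list(combinations(topic_vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Made-up topic distributions over three topics (assumed LDA output).
focused_session = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.10, 0.05]]
scattered_session = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

print(session_topic_coherence(focused_session))
print(session_topic_coherence(scattered_session))
```

Under these assumptions, a score like this could be appended to log-derived features (request counts, error rates, and so on) before training any supervised classifier. The focused session scores higher than the scattered one.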
Similar resources
Motion detection by a moving observer using Kalman filter and neural network in soccer robot
In many autonomous mobile applications, robots must be capable of analyzing the motion of moving objects in their environment. During movement of the robot, the quality of images is affected by camera shake, which causes high errors in image processing outputs. In this paper, we propose a novel method to effectively overcome this problem using Neural Networks and Kalman Filtering theory. This technique u...
The Ontogenesis Knowledgeblog: Lightweight publishing about semantics, with lightweight semantic publishing
The web has moved from a minority interest tool to one of the most heavily used platforms for publication. Despite originally being designed by and for academics, it has left academic publishing largely untouched; most papers are available on-line, but in PDF and are most easily read once printed. Here, we describe our experiments with using commodity web technology to replace the existing publ...
A framework for statistical software development, maintenance, and publishing within an open-access business model
There are several fundamental problems with statistical software development in the academic community. In addition, the development and dissemination of academic software will become increasingly difficult due to a variety of reasons. To solve these problems, a new framework for statistical software development, maintenance, and publishing is proposed: it is based on the paradigm that academic...
Development of RadRob15, A Robot for Detecting Radioactive Contamination in Nuclear Medicine Departments
Accidental or intentional release of radioactive materials into the living or working environment may cause radioactive contamination. In nuclear medicine departments, radioactive contamination is usually due to radionuclides which emit high energy gamma photons and particles. These radionuclides have a broad range of energies and penetration capabilities. Rapid detection of radioactive contami...
Guidelines for selecting journals that avoid fraudulent practices in scholarly publishing
In recent years, scholarly publishing has been faced with many distractive phenomena. Generally, most researchers are unaware of fraudulent practices now common to scholarly publishing and are at risk of becoming a victim of them. Editors also need to have sufficient knowledge about these practices. There are papers that try to increase awareness of authors about fraud in scholarly publishing, ...
Journal title:
- CoRR
Volume: abs/1711.05098
Issue: -
Pages: -
Publication date: 2017